Social Learning in One-Arm Bandit Problems
Authors
Abstract
Similar resources
Social Learning in One-arm Bandit Problems
The copyright to this Article is held by the Econometric Society. It may be downloaded, printed and reproduced only for educational or research purposes, including use in course packs. No downloading or copying may be done for any commercial purpose without the explicit permission of the Econometric Society. For such commercial purposes contact the Office of the Econometric Society (contact inf...
On Robust Arm-Acquiring Bandit Problems
In the classical multi-armed bandit problem, at each stage the player chooses one of N given projects (arms) to generate a reward that depends on the arm played and its current state. The state process of each arm is modeled by a Markov chain whose transition probabilities are known a priori. The player's goal is to maximize the expected total reward. One variant of the problem, the so...
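As a rough illustration of the setup described in that abstract, here is a minimal Python sketch (the arm parameters, the class name MarkovArm, and the myopic rule are invented for the example, not taken from the paper): each arm is a Markov chain with known transitions that advances only when played, and a naive policy pulls the arm whose current state pays the most, which in general is not the optimal rule.

import numpy as np

rng = np.random.default_rng(0)

class MarkovArm:
    """One arm: a finite Markov chain with a known transition matrix
    and a reward attached to each state."""
    def __init__(self, P, rewards, state=0):
        self.P = np.asarray(P, dtype=float)       # known transition probabilities
        self.r = np.asarray(rewards, dtype=float) # reward of each state
        self.state = state

    def pull(self):
        reward = self.r[self.state]
        # only the played arm changes state; idle arms stay frozen
        self.state = rng.choice(len(self.r), p=self.P[self.state])
        return reward

def myopic_choice(arms):
    # naive rule: play the arm whose current state pays the most right now
    return max(range(len(arms)), key=lambda i: arms[i].r[arms[i].state])

arms = [
    MarkovArm([[0.9, 0.1], [0.2, 0.8]], rewards=[1.0, 0.0]),
    MarkovArm([[0.5, 0.5], [0.5, 0.5]], rewards=[0.7, 0.3]),
]
total = sum(arms[myopic_choice(arms)].pull() for _ in range(100))
print(f"total reward over 100 pulls: {total:.1f}")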
Bandit Problems and Online Learning
In this section, we consider problems related to the topic of online learning. In particular, we are interested in problems where data is made available sequentially, and decisions must be made or actions taken based on the data currently available. This is to be contrasted with many problems in optimization and model fitting, where the data under consideration is available at the start. Furthe...
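To make the sequential setting concrete, here is a small epsilon-greedy sketch in Python (the number of arms, the hidden reward means, and the value of epsilon are arbitrary assumptions): every decision uses only the rewards observed so far, and the estimates are updated incrementally as new data arrives.

import random

true_means = [0.3, 0.5, 0.7]        # hidden from the learner
counts = [0] * len(true_means)
estimates = [0.0] * len(true_means)
epsilon = 0.1

random.seed(0)
for t in range(1000):
    if random.random() < epsilon:
        arm = random.randrange(len(true_means))                        # explore
    else:
        arm = max(range(len(true_means)), key=lambda i: estimates[i])  # exploit
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    # incremental update of the running sample mean for the played arm
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print("estimated means:", [round(e, 2) for e in estimates])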
Q-Learning for Bandit Problems
Multi-armed bandits may be viewed as decompositionally structured Markov decision processes (MDPs) with potentially very large state sets. A particularly elegant methodology for computing optimal policies was developed over twenty years ago by Gittins [Gittins & Jones, 1974]. Gittins' approach reduces the problem of finding optimal policies for the original MDP to a sequence of low-dimensional stopping...
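As a sketch of that reduction (the function name, discount factor, bisection bounds, and example numbers are assumptions for illustration, not the paper's notation), the Gittins index of a single Markov arm can be computed by bisecting on a retirement rate and solving the induced optimal stopping problem by value iteration:

import numpy as np

def gittins_index(P, r, s, beta=0.9, tol=1e-6):
    """Gittins index of state s for one Markov arm (transition matrix P,
    per-state rewards r, discount factor beta), via the reduction to
    optimal stopping: bisect on a retirement rate lam until the player is
    indifferent at s between retiring and continuing to play the arm."""
    P, r = np.asarray(P, dtype=float), np.asarray(r, dtype=float)
    lo, hi = r.min(), r.max()          # the index lies between these rates
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        retire = lam / (1.0 - beta)    # value of retiring forever at rate lam
        V = np.full(len(r), retire)
        for _ in range(2000):          # value iteration on the stopping problem
            V = np.maximum(retire, r + beta * P @ V)
        if r[s] + beta * P[s] @ V >= retire:
            lo = lam                   # continuing is still worthwhile: raise lam
        else:
            hi = lam                   # retiring is better: lower lam
    return 0.5 * (lo + hi)

# hypothetical two-state arm: state 0 pays well and is sticky, state 1 pays little
P = [[0.9, 0.1], [0.3, 0.7]]
r = [1.0, 0.2]
print(round(gittins_index(P, r, s=1), 3))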
Local Bandit Approximation for Optimal Learning Problems
In general, procedures for determining Bayes-optimal adaptive controls for Markov decision processes (MDPs) require a prohibitive amount of computation: the optimal learning problem is intractable. This paper proposes an approximate approach in which bandit processes are used to model, in a certain "local" sense, a given MDP. Bandit processes constitute an important subclass of MDPs, and have ...
Journal
Journal title: Econometrica
Year: 2007
ISSN: 0012-9682,1468-0262
DOI: 10.1111/j.1468-0262.2007.00807.x